Search for: All records
Total Resources: 4
Author / Contributor
- Chandak, Yash (4)
- Thomas, Philip (4)
- Jordan, Scott (2)
- Theocharous, Georgios (2)
- Brunskill, Emma (1)
- Castro da Silva, Bruno (1)
- Kostas, James (1)
- Learned-Miller, Erik (1)
- Niekum, Scott (1)
- Shankar, Shiv (1)
- White, Martha (1)
When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO): one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
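The abstract's central idea is to estimate the whole return distribution off-policy and then read every parameter (mean, variance, median, inter-quantile range, CVaR) off that single estimate. A minimal NumPy sketch of that idea follows, assuming per-trajectory importance ratios rho_i = prod_t pi_e(a_t|s_t) / pi_b(a_t|s_t) are already computed and that returns fall on a known sorted grid. The function names `is_return_cdf` and `return_stats` are hypothetical, and the clip-and-close step is an assumed simplification rather than necessarily the estimator used in the paper. Only point estimates are shown; the high-confidence bounds mentioned in the abstract are not.

```python
import numpy as np


def is_return_cdf(returns, weights, grid):
    """Importance-weighted estimate of the evaluation policy's return CDF.

    F_hat(nu) = (1/n) * sum_i rho_i * 1{G_i <= nu}, evaluated at each point of
    `grid` (assumed sorted ascending, with grid[-1] >= every possible return).
    `weights` are per-trajectory importance ratios rho_i (assumed precomputed).
    """
    returns = np.asarray(returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # (len(grid), n) boolean matrix: entry [k, i] is 1{G_i <= grid[k]}
    below = returns[None, :] <= grid[:, None]
    cdf = (below * weights[None, :]).mean(axis=1)
    # The raw estimate can leave [0, 1]; clip it, then close the distribution
    # at the top of the grid so the implied probability masses sum to 1.
    cdf = np.clip(cdf, 0.0, 1.0)
    cdf[-1] = 1.0
    return cdf


def return_stats(grid, cdf, alpha=0.25):
    """Mean, variance, median, IQR, and lower-tail CVaR from one CDF estimate."""
    grid = np.asarray(grid, dtype=float)
    pmf = np.diff(cdf, prepend=0.0)  # probability mass on each grid point

    def quantile(q):
        # smallest grid point whose estimated CDF reaches q
        idx = int(np.searchsorted(cdf, q, side="left"))
        return grid[min(idx, len(grid) - 1)]

    mean = float(np.sum(grid * pmf))
    var = float(np.sum(grid**2 * pmf) - mean**2)

    # CVaR_alpha: expected return over the worst alpha-fraction of outcomes
    remaining, tail = alpha, 0.0
    for g, p in zip(grid, pmf):
        take = min(p, remaining)
        tail += take * g
        remaining -= take
        if remaining <= 0.0:
            break

    return {
        "mean": mean,
        "variance": var,
        "median": quantile(0.5),
        "iqr": quantile(0.75) - quantile(0.25),
        "cvar": tail / alpha,
    }
```

With every weight equal to 1 (behavior and evaluation policies identical), the estimate reduces to the ordinary empirical CDF: returns [1, 2, 3, 4] on the grid [1, 2, 3, 4] give a mean of 2.5 and a median of 2. Estimating the CDF once and deriving each parameter from it is what lets a single procedure cover all of the listed quantities.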
- Kostas, James; Chandak, Yash; Jordan, Scott; Theocharous, Georgios; Thomas, Philip (Proceedings of Machine Learning Research)
- Chandak, Yash; Shankar, Shiv; Thomas, Philip (Proceedings of the AAAI Conference on Artificial Intelligence)
- Chandak, Yash; Jordan, Scott; Theocharous, Georgios; White, Martha; Thomas, Philip (Advances in Neural Information Processing Systems)